#spurious rewards 28/05/2025
Surprising Math Reasoning Gains from Incorrect and Random Rewards in Qwen2.5-Math
Qwen2.5-Math models improve math reasoning significantly even when trained with incorrect or random reward signals, highlighting unique reinforcement learning dynamics not seen in other models.
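To make the three kinds of reward signal concrete, here is a minimal Python sketch. It is an illustration only, not the training code behind the result: the function names, the exact-match check, and the coin-flip probability `p` are all assumptions chosen for this example.

```python
import random


def correct_reward(answer: str, reference: str) -> float:
    """Hypothetical ground-truth reward: 1.0 on an exact string match, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def incorrect_reward(answer: str, reference: str) -> float:
    """Hypothetical 'incorrect' reward: pays out only when the answer is wrong."""
    return 1.0 - correct_reward(answer, reference)


def random_reward(rng: random.Random, p: float = 0.5) -> float:
    """Hypothetical random reward: a coin flip that ignores the answer entirely.

    The probability p is an assumed parameter for illustration.
    """
    return 1.0 if rng.random() < p else 0.0


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs give the same flips
    for answer in ["42", "41"]:
        print(
            answer,
            correct_reward(answer, "42"),
            incorrect_reward(answer, "42"),
            random_reward(rng),
        )
```

Only `correct_reward` carries information about whether the answer is right. The other two are the kind of signal the summary calls incorrect or random, which is why gains under them are surprising.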